groups, in R statistical software, only the Tukey-Kramer test is available, and not Tukey’s HSD test
(as demonstrated later in this chapter in the section “Executing and interpreting post-hoc t tests”).
Scheffe’s test compares all pairs of groups, but also lets you bundle certain groups together if
doing so makes physical sense. For example, if you have two treatment groups and a control group
(such as Drug A, Drug B, and Control), you may want to determine whether either drug is different
from the control. In other words, you may want to test Drug A and Drug B as one group against the
control group, in which case you use Scheffe’s test. Because it is the most conservative of these
tests, Scheffe’s test is the safest choice if you are worried about Type I error. On the other hand,
it is less powerful than the other tests, meaning it will miss a real difference in your data more
often than they will.
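To make the idea of bundling groups concrete, here is a minimal sketch in base R of a Scheffe-style comparison of the two drug groups combined against the control. The data frame trial, its simulated values, and the variable names are all hypothetical and are not part of the NHANES example. The sketch assumes the usual Scheffe criterion, in which the contrast estimate divided by its standard error is compared with the square root of (k - 1) times the critical F value.

```r
# Simulated data (hypothetical; not from NHANES): three groups of 20 values each.
set.seed(42)
trial <- data.frame(
  group = factor(rep(c("DrugA", "DrugB", "Control"), each = 20),
                 levels = c("DrugA", "DrugB", "Control")),
  response = c(rnorm(20, mean = 95,  sd = 10),
               rnorm(20, mean = 97,  sd = 10),
               rnorm(20, mean = 105, sd = 10))
)

fit <- aov(response ~ group, data = trial)

# Pieces needed for the contrast: group means, group sizes, mean square error.
group_means <- tapply(trial$response, trial$group, mean)
group_n     <- tapply(trial$response, trial$group, length)
k           <- nlevels(trial$group)
df_error    <- df.residual(fit)
mse         <- sum(residuals(fit)^2) / df_error

# Contrast: average of the two drug groups minus the control group.
# The weights follow the factor level order and must sum to zero.
weights  <- c(DrugA = 0.5, DrugB = 0.5, Control = -1)
estimate <- sum(weights * group_means)
std_err  <- sqrt(mse * sum(weights^2 / group_n))

# Scheffe criterion at alpha = 0.05.
scheffe_stat     <- abs(estimate) / std_err
scheffe_critical <- sqrt((k - 1) * qf(0.95, df1 = k - 1, df2 = df_error))

# TRUE means the bundled drug groups differ from control by Scheffe's criterion.
scheffe_stat > scheffe_critical
```

Changing the weights changes the comparison. For example, weights of 1, -1, and 0 would compare Drug A with Drug B and ignore the control group.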
Running an ANOVA
Running a one-way ANOVA in R is similar to running an independent t test (see the earlier section
“Executing a t test”). However, in this case, we save the results as an object, and then run R code on
that object to get the output of our results.
Let’s turn back to the NHANES data. First, we need to prepare our grouping variable, which is the
three-level variable MARITAL (where 1 = married, 2 = never married, and 3 = all other marital
statuses). Next, we identify our dependent variable, which is our fasting glucose variable called
LBXGLU. Finally, we employ the aov command to run the ANOVA in R, and save the results in an
object called GLUCOSE_aov. We use the following code: GLUCOSE_aov <- aov(LBXGLU ~
as.factor(MARITAL), data = NHANES). (We use the as.factor command on the MARITAL variable so that
R handles it as a categorical variable with three groups in the calculation, not as a numeric one.)
Next, we can get our output by running a summary command on this object using this code:
summary(GLUCOSE_aov).
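The two commands can be tried end to end with the sketch below. Because the real NHANES file is not included here, the code first builds a small simulated data frame with the same name and the same two columns. Those simulated values are an assumption made only so the example runs, and the output will not match the numbers discussed in this chapter.

```r
# Simulated stand-in for the NHANES data (hypothetical values, not survey data).
set.seed(1)
NHANES <- data.frame(
  MARITAL = sample(1:3, size = 300, replace = TRUE),  # 1, 2, or 3
  LBXGLU  = rnorm(300, mean = 105, sd = 25)           # fasting glucose
)

# MARITAL is stored as a number, so convert it to a factor inside the formula.
GLUCOSE_aov <- aov(LBXGLU ~ as.factor(MARITAL), data = NHANES)

# Print the ANOVA table for the saved object.
summary(GLUCOSE_aov)
```

If you leave out as.factor, R treats MARITAL as a single numeric predictor, and the ANOVA table shows 1 degree of freedom for it rather than the 2 you expect for a three-level grouping variable.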
Interpreting the output of an ANOVA
We describe the R output here, but output from other statistical packages will have similar information.
The output begins with the variance table (or simply the ANOVA table). The first column has no
heading and names each row of the table. It is followed by columns with the following
headings: Df (for degrees of freedom), Sum Sq (for the sum of squares), Mean Sq (mean square), F value (value of F
statistic), and Pr(>F) (p value for the F test). You may recall that in order for an ANOVA test to be
statistically significant at α = 0.05, the p value on the F must be < 0.05. It is easy to identify that F =
12.59 on the output because it is labeled F value. But the p value on the F is labeled Pr(>F), and that’s
not very obvious. As you saw before, the p value is in scientific notation, but resolves to 0.00000353,
which is < 0.05, so it is statistically significant.
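If you would rather pull the F statistic and its p value out of the table than read them off the screen, the following sketch shows one way to do it. It again uses a simulated stand-in for the NHANES data (an assumption, so the values will not equal F = 12.59 or p = 0.00000353), and the object names anova_table, f_value, and p_value are made up for this illustration.

```r
# Simulated stand-in for the NHANES data (hypothetical values, not survey data).
set.seed(1)
NHANES <- data.frame(
  MARITAL = sample(1:3, size = 300, replace = TRUE),
  LBXGLU  = rnorm(300, mean = 105, sd = 25)
)
GLUCOSE_aov <- aov(LBXGLU ~ as.factor(MARITAL), data = NHANES)

# summary() returns a list; its first element is the ANOVA table.
anova_table <- summary(GLUCOSE_aov)[[1]]

# The first row belongs to the grouping variable, the second to the residuals.
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# Show the p value as an ordinary decimal instead of scientific notation.
format(p_value, scientific = FALSE)

# Check statistical significance at alpha = 0.05.
p_value < 0.05

# The same formatting applied to the p value discussed in the text.
format(3.53e-06, scientific = FALSE)
```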
If you use R for this, you will notice that at the bottom of the output it says Signif. codes: 0
‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1. This is R explaining its coding system for p values. It
means that if a p value in output is followed by three asterisks, this is a code for p < 0.001. Two
asterisks is a code for p < 0.01, and one asterisk indicates p < 0.05. A period indicates p < 0.1,
and no notation indicates the p value is greater than or equal to 0.1 — meaning by most standards,
it is not statistically significant at all. Other statistical packages often use similar coding to make
it easy for analysts to pick out statistically significant p values in the output.
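The coding system can be reproduced with the base R function symnum, using the same cutpoints and symbols that appear in the Signif. codes line. This is a small sketch. The vector p_values is invented for illustration, apart from its first entry, which is the p value from the ANOVA discussed above.

```r
# Example p values (hypothetical, except the first, which is from the text).
p_values <- c(0.00000353, 0.004, 0.03, 0.07, 0.50)

# Map each p value to its significance code using the cutpoints in the legend.
stars <- symnum(p_values, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols   = c("***", "**", "*", ".", " "))

# Show each p value next to its code.
data.frame(p = p_values, code = as.character(stars))
```

In this sketch, the first value receives three asterisks, and the last value receives a blank, which is how R marks a p value that does not reach even the 0.1 level.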